Abstract. Consider an unlimited homogeneous medium disturbed by points generated via a Poisson process. The neighborhood of a point plays an important role in spatial statistics problems. Here, we obtain analytically the distance statistics to the kth nearest neighbor in a d-dimensional medium. Next, we focus our attention on the high-dimensionality and high-neighborhood-order limits. High dimensionality makes the distance distribution behave as a delta sequence, with mean value given by Cerf's conjecture. In high neighborhood order, the distance statistics converges to a Gaussian distribution. The general distance statistics can be applied to detect departures from Poissonian point distribution hypotheses, as proposed by Thompson and generalized here.
1 Introduction

Consider a d-dimensional, unbounded, isotropic and homogeneous medium with disorder (points) generated by a Poisson process. The expected number of points in a volume $V_d$ is $\lambda = \rho V_d$, where $\rho$ is the point density. This disordered medium, although unlimited, can be represented computationally as a d-dimensional hypercube containing N points whose coordinates are randomly distributed with a uniform probability density function (pdf) along each edge (random point problem [1]). This is one possible way to construct a disordered medium in which the distances among the points are not fixed but vary statistically. In this medium, it is possible to exploit the neighborhood and distance statistics.
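A minimal sketch of this construction in Python, using only the standard `random` module (the function names `random_points` and `count_in_box` are illustrative, not taken from the original work): it places N points uniformly in a hypercube of edge $\ell$ and checks that the number of points falling in a sub-box of volume $V$ has a mean close to $\lambda = \rho V$. With N fixed, the counts are strictly binomial, which is close to Poisson when $V$ is a small fraction of the hypercube volume.

```python
import random

def random_points(n, d, edge=1.0, seed=None):
    """Return n points with coordinates drawn uniformly along each edge of a d-hypercube."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, edge) for _ in range(d)] for _ in range(n)]

def count_in_box(points, lower, side):
    """Count the points inside the axis-aligned box [lower_i, lower_i + side) in every coordinate."""
    return sum(
        all(lo <= x < lo + side for x, lo in zip(p, lower))
        for p in points
    )

if __name__ == "__main__":
    n, d, edge = 5000, 2, 1.0
    rho = n / edge ** d                      # point density rho = N / edge^d
    pts = random_points(n, d, edge, seed=1)
    side = 0.1                               # sub-box of volume V = side^d
    rng = random.Random(2)
    counts = [
        count_in_box(pts, [rng.uniform(0.0, edge - side) for _ in range(d)], side)
        for _ in range(100)
    ]
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    print(f"lambda = rho*V = {rho * side ** d:.1f}, mean = {mean:.1f}, variance = {var:.1f}")
```

For a Poisson count, the sample mean and the sample variance printed above should both be close to $\lambda$.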

Neighborhood statistics quantifies the probability of a point being the mth nearest neighbor of its nth nearest neighbor. For $N \to \infty$, this probability was first calculated by Clark and Evans [2] for $m = n = 1$, and later generalized by Clark to mutual neighbors, $m = n$ [3]. Dacey corrected the expression obtained by Clark [3]. Neighborhood statistics was generalized by Cox to $m \neq n$ [5] and interpreted in terms of the multinomial distribution by Tercariol et al. [6].

Distance statistics quantifies the distribution of the distance from a given point to its kth nearest neighbor and can be applied in several disciplines. In physics and biology, these statistics can be used, for instance, to calculate the average separation between stars [7], aggregation in plant communities […], the optimal tour for the Euclidean traveling salesman problem […], the Euclidean matching problem […], partially self-avoiding walks […], thin films [16], etc. In computer science, the nearest neighbor distance statistics can be employed as a pattern classifier […], besides being used to determine the distance between network terminals […], etc.

Up to now, only two aspects of this problem have been thoroughly addressed. The first is the distance among points […] and its higher-order moments [22, 23] for different point distributions. The second is the calculation of the distribution for low-dimensional media, $d \le 3$, for the nearest neighbor […] and for arbitrary neighborhood order [9]. The distance distribution to the kth nearest neighbor in arbitrary dimension has been calculated by Martin [19] in the context of the distance between internet access terminals. Despite the knowledge of the mathematical expression […], the influence of the parameters on the distribution has not been fully addressed, mainly in the cases of high dimension and high neighborhood order.

These limiting cases are non-trivial because of the ratio $\Gamma(z+x)/\Gamma(z)$ for $z \to \infty$, where $\Gamma(z)$ is the gamma function. If one considers only a simpler expansion, inconsistencies such as undefined central moments, for example the variance and skewness, occur: keeping only $\Gamma(z+x)/\Gamma(z) \approx z^x$ makes the second moment equal to the square of the first, so the variance vanishes and the skewness becomes 0/0. Our main contribution is to correct these inconsistencies using higher-order terms in the expansion of this ratio. We calculate the distribution in these limiting cases and prove important results, such as the one conjectured by Cerf et al. […].

In this paper, we obtain analytical expressions for the distance statistics in high dimensionality, which lead to an equivalence with the random link model. The high neighborhood order limit is also addressed; the resulting distribution converges to a Gaussian owing to the central limit theorem. The distance statistics can be used not only to predict the separation between neighbors, but also to detect departures from the Poissonian hypothesis, as proposed by Thompson [9] and generalized here. Furthermore, we explore special cases of the distance statistics by varying the dimensionality and the neighborhood order.

Our paper is organized as follows. In Sec. 2, we obtain the pdf of the distance statistics in two distinct ways: through geometric interpretation and through cumulative distribution functions. This pdf is described by the generalized gamma distribution […]. In Sec. 3, we calculate the limiting cases of high neighborhood order, $k \gg 1$, and high dimension, $d \gg 1$. In this way, we prove mathematically the conjecture of Cerf et al. […], and we consider the combination of both limiting cases. In Sec. 4, we explore special cases of the distance statistics by varying the dimension and neighborhood order, and we propose a generalized hypothesis test to quantify deviations from a Poissonian spatial process. Finally, in Sec. 5, we present the conclusions.


2 Statistics of the random point problem

In this section, we obtain the analytical expression for the distance distribution and validate it by Monte Carlo simulations. In addition, we collapse the data onto the nearest neighbor distance distribution. Consider a d-dimensional medium with density $\rho = \rho_1^d$, where $\rho_1$ is the one-dimensional medium density. This choice keeps the mean distance among points constant, which allows us to compare systems of different dimensionality. The expected number of points in a hypersphere of radius $l$ is $\lambda = N_u l^d$, where $N_u = \rho\pi^{d/2}/\Gamma(1+d/2)$ is the number of points in a d-dimensional sphere of unit radius. The probability of $k$ points falling into a sphere of radius $l$ is given by the Poisson formula, $P(k) = \lambda^k e^{-\lambda}/k!$.

The first method to derive the distance statistics is based on geometric arguments. The probability that the kth nearest neighbor lies between $l$ and $l + dl$ is written as the product of the probability of a sphere of radius $l$ containing $k - 1$ points and the probability of the spherical shell of thickness $dl$ containing exactly one point: $f_{\rho,d,k}(l)\,dl = P(k-1)P(1)$. As $dl \ll l$, the probability density function becomes:

$$ f_{\rho,d,k}(l) = \frac{d\,N_u^{k}\,l^{dk-1}}{\Gamma(k)}\exp\left(-N_u l^{d}\right), \qquad (1) $$

where $k$ is the neighborhood order. Eq. (1) can be mapped onto the generalized gamma distribution:

$$ GG(x \mid \theta, k, \beta) = \frac{1}{\beta\,\theta\,\Gamma(k)}\left(\frac{x}{\theta}\right)^{k/\beta-1}\exp\left[-\left(\frac{x}{\theta}\right)^{1/\beta}\right], \qquad (2) $$

with $\beta = 1/d$ and $\theta = N_u^{-\beta} = [\rho\pi^{d/2}/\Gamma(1+d/2)]^{-1/d}$, which depends on the point density and the medium dimension. $\theta$ is non-trivially affected by the medium symmetry. In computer simulations, $\theta$ is affected by the medium boundaries only through $\rho$. On the one hand, for a d-dimensional hypercube with edge length $\ell$ and N points, $\rho = N/\ell^d$. On the other hand, for a sphere of radius $\ell$, $\rho = N\Gamma(1+d/2)/(\pi^{d/2}\ell^d)$.

Monte Carlo simulations validated Eq. (1). The medium consisted of a cube with N points and density $\rho = N/\ell^d$. Applied to this limited medium, Eq. (1) is an approximation because of boundary effects, since points near the boundaries have fewer neighbors. Periodic boundary conditions minimize this effect. This validation is depicted in Fig. 1, where one sees that, as the neighborhood order increases, the distance distribution becomes more symmetric around its mean value. Moreover, the numerical experiments consider a finite number of points; the finite-size correction to the mean distance is of order $1/N$ [20]. Further, when Eq. (1) is written in terms of $\lambda = N_u l^d$, the number of points in a d-dimensional sphere of radius $l$, it collapses into $f_k(\lambda) = \lambda^{k-1}e^{-\lambda}/\Gamma(k)$.

Fig. 1. Comparison of the analytical results given by Eq. (1) (full lines), up to the fifth neighbor in a two-dimensional medium, with Monte Carlo simulations. Simulation parameters are $\rho = 50000$, with periodic boundary conditions. Increasing the neighbor order makes the distance statistics more symmetric with respect to the mean value.
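The Monte Carlo validation described above can be sketched in Python as follows. This is a minimal illustration under our own assumptions: a unit square, so that $\rho = N$; a brute-force neighbor search with the minimum-image convention to implement the periodic boundaries; and far fewer points than in Fig. 1 so that it runs quickly. The helper names are hypothetical. The sketch also checks numerically that Eq. (1) coincides with the generalized gamma form of Eq. (2) and that it integrates to one.

```python
import heapq
import math
import random

def unit_ball_count(rho, d):
    """N_u = rho * pi^(d/2) / Gamma(1 + d/2): expected number of points in a unit d-ball."""
    return rho * math.pi ** (d / 2) / math.gamma(1 + d / 2)

def pdf_eq1(l, rho, d, k):
    """Eq. (1): f(l) = d N_u^k l^(dk - 1) exp(-N_u l^d) / Gamma(k)."""
    nu = unit_ball_count(rho, d)
    return d * nu ** k * l ** (d * k - 1) * math.exp(-nu * l ** d) / math.gamma(k)

def pdf_gg(x, theta, k, beta):
    """Eq. (2): generalized gamma density with scale theta, shape k and beta = 1/d."""
    z = x / theta
    return z ** (k / beta - 1) * math.exp(-z ** (1 / beta)) / (beta * theta * math.gamma(k))

def mean_eq3(rho, d, k):
    """Eq. (3): <l> = N_u^(-1/d) Gamma(k + 1/d) / Gamma(k)."""
    beta = 1 / d
    return unit_ball_count(rho, d) ** (-beta) * math.exp(math.lgamma(k + beta) - math.lgamma(k))

def kth_neighbor_distances(n, k, n_ref=200, seed=0):
    """kth-neighbor distances of n_ref points in a unit square (rho = n), periodic boundaries."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    out = []
    for i in range(n_ref):
        xi, yi = pts[i]
        dists = []
        for j, (xj, yj) in enumerate(pts):
            if j != i:
                dx = abs(xi - xj)
                dy = abs(yi - yj)
                # minimum-image convention implements the periodic boundary conditions
                dists.append(math.hypot(min(dx, 1 - dx), min(dy, 1 - dy)))
        out.append(heapq.nsmallest(k, dists)[-1])
    return out

if __name__ == "__main__":
    n, k = 2000, 3
    sample = kth_neighbor_distances(n, k)
    print(f"Monte Carlo <l> = {sum(sample) / len(sample):.5f}, Eq. (3) = {mean_eq3(n, 2, k):.5f}")
```

With 200 reference points, the Monte Carlo mean should agree with Eq. (3) to within a few percent; the residual difference is sampling noise plus the $1/N$ finite-size correction.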













Cristiano Roberto Fabri Granzotti, Alexandre Souto Martinez: Distance statistics in random media 




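The second method is based on the cumulative distribution function. Consider first a point $i$ and its nearest neighbor in a two-dimensional medium. The probability of not finding any other point closer than $l$ is $P(k = 0) = e^{-\rho\pi l^2}$. For the random variable $L$, which describes the distance from point $i$ to its nearest neighbor, $L > l$ if there are no points in the area $\pi l^2$; consequently, $P(L > l) = e^{-\rho\pi l^2}$. Thus, the cumulative distribution function of $L$ is $F_{\rho,2,1}(l) = P(L \le l) = 1 - P(L > l) = 1 - e^{-\rho\pi l^2}$. The pdf that describes the distance to the first neighbor is $dF_{\rho,2,1}(l)/dl$.

This reasoning can be extended to arbitrary neighborhood order and dimensionality: the distance to the kth nearest neighbor exceeds $l$ if and only if the sphere of radius $l$ contains at most $k - 1$ points, so that $F_{\rho,d,k}(l) = 1 - \sum_{j=0}^{k-1} P(j)$, and differentiating with respect to $l$ leads to Eq. (1).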
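A short numerical check of the second method, assuming only Python's standard library (the function names are illustrative): the CDF is built from the Poisson probabilities of finding at most $k - 1$ points in the d-ball of radius $l$, and its central finite-difference derivative is compared with Eq. (1).

```python
import math

def survival_kth(l, rho, d, k):
    """P(L_k > l) = sum_{j<k} P(j): the d-ball of radius l holds at most k - 1 points."""
    lam = rho * math.pi ** (d / 2) / math.gamma(1 + d / 2) * l ** d
    return sum(math.exp(-lam) * lam ** j / math.factorial(j) for j in range(k))

def cdf_kth(l, rho, d, k):
    """F_{rho,d,k}(l) = 1 - P(L_k > l)."""
    return 1.0 - survival_kth(l, rho, d, k)

def pdf_from_cdf(l, rho, d, k, h=1e-6):
    """Central finite-difference estimate of dF/dl."""
    return (cdf_kth(l + h, rho, d, k) - cdf_kth(l - h, rho, d, k)) / (2 * h)

def pdf_eq1(l, rho, d, k):
    """Eq. (1), written out directly for comparison."""
    nu = rho * math.pi ** (d / 2) / math.gamma(1 + d / 2)
    return d * nu ** k * l ** (d * k - 1) * math.exp(-nu * l ** d) / math.gamma(k)

if __name__ == "__main__":
    for rho, d, k, l in [(1.0, 2, 1, 0.4), (1.0, 3, 4, 0.9), (10.0, 1, 2, 0.15)]:
        print(d, k, pdf_from_cdf(l, rho, d, k), pdf_eq1(l, rho, d, k))
```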

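The mean distance of a point $i$ to its kth nearest neighbor is:

$$ \langle l \rangle_{\rho,d,k} = N_u^{-\beta}\,\frac{\Gamma(k+\beta)}{\Gamma(k)}, \qquad (3) $$

which was first obtained by Percus and Martin [20] and factorizes into a density term and a neighborhood order term. The variance is:

$$ \sigma^2(l)_{\rho,d,k} = N_u^{-2\beta}\left[\frac{\Gamma(k+2\beta)}{\Gamma(k)} - \left(\frac{\Gamma(k+\beta)}{\Gamma(k)}\right)^{2}\right], \qquad (4) $$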


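which is difficult to analyze because of the fraction $\beta = 1/d$ in the gamma function arguments. When $k \gg \beta$, one can consider a simple expansion of the ratio $\Gamma(k+\beta)/\Gamma(k)$ as $k^{\beta}$ or $k^{\beta}e^{\beta(\beta-1)/(2k)}$; however, the central moments calculated in this way, such as the variance and skewness, are inconsistent. Expanding to higher orders, for $z \to \infty$ one has:

$$ \frac{\Gamma(z+x)}{\Gamma(z)} \approx z^{x}\exp\left[\frac{x^2 - x}{2z} + \frac{-x + 3x^2 - 2x^3}{12z^2}\right]. \qquad (5) $$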
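To see how much Eq. (5) improves on the leading-order term $z^x$, the sketch below compares both with the exact ratio computed through `math.lgamma`. The function names and test values are our own choices.

```python
import math

def ratio_exact(z, x):
    """Gamma(z + x) / Gamma(z), evaluated through log-gamma to avoid overflow."""
    return math.exp(math.lgamma(z + x) - math.lgamma(z))

def ratio_leading(z, x):
    """Leading-order approximation z^x."""
    return z ** x

def ratio_eq5(z, x):
    """Eq. (5): z^x exp[(x^2 - x)/(2z) + (-x + 3x^2 - 2x^3)/(12 z^2)]."""
    return z ** x * math.exp((x * x - x) / (2 * z) + (-x + 3 * x * x - 2 * x ** 3) / (12 * z * z))

if __name__ == "__main__":
    x = 0.5                                   # e.g. beta = 1/d with d = 2
    for z in (1, 5, 20, 100):
        exact = ratio_exact(z, x)
        print(z, ratio_leading(z, x) / exact - 1, ratio_eq5(z, x) / exact - 1)
```

The relative error of $z^x$ decays only as $1/z$, whereas that of Eq. (5) decays as $1/z^3$, which is what makes the variance and skewness well defined in the limits below.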


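According to Eq. (5), for $\beta \ll 1$ the mean and standard deviation of $l$ can be approximated as $\langle l \rangle_{\rho,d,k} \approx N_u^{-\beta}k^{\beta}$ and $\sigma(l)_{\rho,d,k} \approx N_u^{-\beta}\beta k^{\beta}\sqrt{1/k + 1/(2k^2)}$, which for $k = 1$ reduces to $N_u^{-\beta}\beta c$, with $c = \sqrt{3/2}$. This indicates that, in high dimensionality, the mean distance is weakly affected by the neighborhood order, while the variance decays very rapidly. This occurs because, when $d \gg 1$, the volume of a sphere is almost entirely concentrated in a very thin spherical shell. The skewness of Eq. (1) depends non-trivially on $k$ and $\beta$: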


$$ \gamma_1 = \frac{2 - 3\,\Omega_1^2(k,\beta)/\Omega_2(k,\beta) + \Omega_1^3(k,\beta)/\Omega_3(k,\beta)}{\left[\Omega_1^2(k,\beta)/\Omega_2(k,\beta) - 1\right]^{3/2}}, \qquad (6) $$

where $\Omega_n(k,\beta) = B(k, n\beta)/\Gamma(n\beta)$ and $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function. The skewness depends only on the neighborhood order and the dimension, being independent of the medium boundaries and density. Using Eq. (5), it can be approximated as $\gamma_1 \approx (3\beta - 1)k^{-1/2}$. This simplification accurately describes how the asymmetry around the mean value changes with the neighborhood order (see Fig. 1). Also, the skewness factorizes into a neighborhood order term and a dimensionality term.
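The following sketch evaluates Eqs. (3), (4) and (6) through `math.lgamma` and compares the skewness with the large-$k$ approximation $\gamma_1 \approx (3\beta - 1)/\sqrt{k}$, which follows from Eq. (5). The function names are hypothetical. For $d = 1$, Eq. (1) is a gamma distribution in $l$, whose skewness is exactly $2/\sqrt{k}$, and for $d = 2$, $k = 1$ it is the Rayleigh distribution; both give convenient consistency checks.

```python
import math

def moment_ratio(k, n, beta):
    """m_n = Gamma(k + n beta) / Gamma(k), i.e. <l^n> / theta^n for Eq. (2)."""
    return math.exp(math.lgamma(k + n * beta) - math.lgamma(k))

def mean_std(rho, d, k):
    """Eqs. (3) and (4): mean and standard deviation of the kth-neighbor distance."""
    beta = 1.0 / d
    theta = (rho * math.pi ** (d / 2) / math.gamma(1 + d / 2)) ** (-beta)
    m1, m2 = moment_ratio(k, 1, beta), moment_ratio(k, 2, beta)
    return theta * m1, theta * math.sqrt(m2 - m1 * m1)

def skewness_eq6(k, d):
    """Eq. (6), with Omega_n(k, beta) = B(k, n beta) / Gamma(n beta) = 1 / m_n."""
    beta = 1.0 / d
    o1, o2, o3 = (1.0 / moment_ratio(k, n, beta) for n in (1, 2, 3))
    num = 2.0 - 3.0 * o1 ** 2 / o2 + o1 ** 3 / o3
    return num / (o1 ** 2 / o2 - 1.0) ** 1.5

def skewness_approx(k, d):
    """Large-k approximation gamma_1 ~ (3 beta - 1) / sqrt(k)."""
    return (3.0 / d - 1.0) / math.sqrt(k)

if __name__ == "__main__":
    for d in (1, 2, 3, 10):
        print(d, [round(skewness_eq6(k, d), 4) for k in (1, 10, 100)],
              [round(skewness_approx(k, d), 4) for k in (1, 10, 100)])
```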


3 Limiting cases 

In this section, we analyze the behavior of Eq. (1), first in the limit $d \gg 1$, next for $k \gg 1$, and finally with both limits taken simultaneously. Although straightforward, these calculations present some pitfalls, which are duly stressed.


3.1 High dimensionality 


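Let us introduce a new variable $y = (l - \langle l \rangle_{\rho,d,1})/\sigma(l)_{\rho,d,1}$, which standardizes the distance with respect to the mean nearest neighbor separation between the points. As $d \gg 1$, one has $\langle l \rangle_{\rho,d,1} \approx N_u^{-\beta}$ and $\sigma(l)_{\rho,d,1} \approx N_u^{-\beta}\beta c$. The distance can then be rewritten as follows:

$$ l = N_u^{-\beta}(1 + \beta c y), \qquad (7) $$

with $c = \sqrt{3/2}$ and $\beta = 1/d$. In the $y$ variable, the pdf is obtained by applying the probability transformation law to Eq. (1), using $l$ from Eq. (7); note that $N_u l^d = (1 + \beta c y)^{1/\beta} \to e^{cy}$ as $\beta \to 0$. Starting with $k = 1$, one finds the Gumbel distribution, $(1/|A|)\exp[-(x/A) - \exp(-x/A)]$, with $A = -1/c$:

$$ g(y) = c\exp[cy - \exp(cy)], \qquad (8) $$

which describes the minimum deviation from the expected separation. For higher neighborhood orders, one has:

$$ g_k(y) = \frac{c}{\Gamma(k)}\exp[kcy - \exp(cy)], \qquad (9) $$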


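which is the log-gamma distribution, $[1/(|A|\Gamma(k))]\exp[kx/A - \exp(x/A)]$, here with $A = 1/c$. The mean distance among points is calculated in two steps. First, $\langle y \rangle = \psi(k)/c = (1/c)\,d[\ln\Gamma(k)]/dk$, where $\psi(k)$ is the digamma function […], so that the mean distance among points is $\langle l \rangle_{\rho,d,k} = N_u^{-\beta}[1 + \beta\psi(k)]$. Second, since the neighborhood order is an integer, the digamma function can be represented as $\psi(k) = -\gamma + \sum_{i=1}^{k-1} 1/i$, which gives the mean distance in $l$:

$$ \langle l \rangle_{\rho,d,k} = N_u^{-\beta}\left[1 + \beta\left(-\gamma + \sum_{i=1}^{k-1}\frac{1}{i}\right)\right], \qquad (10) $$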


where $\gamma = 0.57721\ldots$ is Euler's constant. For $k \gg 1$, $\psi(k) \approx \ln k$ and $\langle l \rangle_{\rho,d,k} \approx N_u^{-\beta}[1 + \beta\ln k]$, which was first obtained as a conjecture by Cerf et al. by expanding the term $\Gamma(k+\beta)/\Gamma(k)$ of Eq. (3). On average, this term represents the distance increment due to the increase in neighborhood order. Owing to the accurate approximation for the ratio $\Gamma(z+x)/\Gamma(z)$, we demonstrate this conjecture using distance statistics. Furthermore, Eq. (9) allows us to calculate not only the mean distance, but also the variance and higher-order moments.
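A quick numerical comparison, in Python, of the exact scaled mean $\Gamma(k+\beta)/\Gamma(k)$ from Eq. (3) with the high-dimensional form $1 + \beta\psi(k)$ of Eq. (10) and the large-$k$ form $1 + \beta\ln k$. Distances are measured in units of $N_u^{-\beta}$, and the helper names are ours.

```python
import math

EULER_GAMMA = 0.5772156649015329

def digamma_int(k):
    """psi(k) = -gamma + sum_{i=1}^{k-1} 1/i, valid for integer k >= 1."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, k))

def scaled_mean_exact(d, k):
    """<l> / N_u^(-beta) from Eq. (3): Gamma(k + beta) / Gamma(k)."""
    beta = 1.0 / d
    return math.exp(math.lgamma(k + beta) - math.lgamma(k))

def scaled_mean_eq10(d, k):
    """Eq. (10): <l> / N_u^(-beta) ~ 1 + beta psi(k) for d >> 1."""
    return 1.0 + digamma_int(k) / d

def scaled_mean_cerf(d, k):
    """Large-k form 1 + beta ln k."""
    return 1.0 + math.log(k) / d

if __name__ == "__main__":
    d = 1000
    for k in (1, 10, 100, 1000):
        print(k, scaled_mean_exact(d, k), scaled_mean_eq10(d, k), scaled_mean_cerf(d, k))
```

The difference between the exact value and Eq. (10) is of order $\beta^2$, so the agreement improves quickly as $d$ grows.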

The variance of Eq. (9) is $\sigma^2(y)_k = \psi'(k)/c^2$, where $\psi'(k)$ is the trigamma function. Since $\sigma^2(a + bx) = b^2\sigma^2(x)$, the standard deviation in $l$ is

$$ \sigma(l)_{\rho,d,k} = N_u^{-\beta}\beta\sqrt{\psi'(k)} \approx \frac{N_u^{-\beta}\beta}{\sqrt{k}}, \qquad (11) $$

where we made use of $\psi'(k) \approx 1/k$ for $k \gg 1$.
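The same kind of check for the standard deviation: the sketch below compares the exact value from Eq. (4) with $\beta\sqrt{\psi'(k)}$ implied by Eq. (9) and with the asymptotic form $\beta/\sqrt{k}$ of Eq. (11), again in units of $N_u^{-\beta}$. Using `math.expm1` to form $m_2 - m_1^2$ is our own choice to limit cancellation when $\beta$ is small; the helper names are hypothetical.

```python
import math

def scaled_std_exact(d, k):
    """sigma(l) / N_u^(-beta) from Eq. (4)."""
    beta = 1.0 / d
    lg1 = math.lgamma(k + beta) - math.lgamma(k)
    lg2 = math.lgamma(k + 2 * beta) - math.lgamma(k)
    # m2 - m1^2 = m1^2 (exp(lg2 - 2 lg1) - 1); expm1 limits cancellation for small beta
    return math.exp(lg1) * math.sqrt(math.expm1(lg2 - 2 * lg1))

def trigamma_int(k):
    """psi'(k) = pi^2/6 - sum_{i=1}^{k-1} 1/i^2, valid for integer k >= 1."""
    return math.pi ** 2 / 6 - sum(1.0 / i ** 2 for i in range(1, k))

def scaled_std_log_gamma(d, k):
    """beta * sqrt(psi'(k)): standard deviation implied by Eq. (9)."""
    return math.sqrt(trigamma_int(k)) / d

def scaled_std_eq11(d, k):
    """Eq. (11): beta / sqrt(k), for d >> 1 and k >> 1."""
    return 1.0 / (d * math.sqrt(k))

if __name__ == "__main__":
    d = 1000
    for k in (1, 10, 100):
        print(k, scaled_std_exact(d, k), scaled_std_log_gamma(d, k), scaled_std_eq11(d, k))
```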

In the $l$ variable, the mean distance is only weakly affected by the neighborhood order, and the variance vanishes rapidly. This occurs because, in high dimensionality, a small increase in the radius leads to a large increase in volume. The larger the radius, the smaller the radius increment needed to generate the same increase in volume. Therefore, the higher the neighborhood order, the smaller the required radius increase and the lower the standard deviation around the mean value. As $d \to \infty$, the distance distribution in the variable $l$ behaves as a delta sequence.
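The geometric argument can be made concrete with two short formulas (our own illustration, not from the original): the fraction of a d-ball's volume lying in an outer shell of relative thickness $\epsilon$ is $1 - (1-\epsilon)^d$, and the relative radius increase that multiplies the volume by a given factor $a$ is $a^{1/d} - 1$.

```python
import math

def shell_fraction(d, eps):
    """Fraction of a d-ball's volume in the outer shell of relative thickness eps."""
    return 1.0 - (1.0 - eps) ** d

def radius_increase_for_volume_factor(d, factor):
    """Relative radius increase factor^(1/d) - 1, computed with expm1 for accuracy at large d."""
    return math.expm1(math.log(factor) / d)

if __name__ == "__main__":
    for d in (2, 10, 100, 1000):
        print(d, shell_fraction(d, 0.01), radius_increase_for_volume_factor(d, 2.0))
```

For $d = 1000$, a shell only 1% thick already holds more than 99.99% of the volume, and doubling the volume requires a radius increase of about 0.07%.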


3.2 High neighborhood order 

The second limiting case is the distance distribution for high neighborhood order. From Eq. (11), one sees that the standard deviation decays as $k^{-1/2}$ for $k \gg \beta$. According to the central limit theorem, for the sum $S$ of $N$ independent and identically distributed random variables with finite variance, the relative standard deviation $\sigma_r = \sigma(S)/\langle S \rangle$ and the skewness both decrease as $1/\sqrt{N}$ [13]. For $k \gg 1$ and arbitrary dimension, the relative standard deviation and the skewness of Eq. (1) decrease as $1/\sqrt{k}$. This indicates that, besides restoring the symmetry of the pdf around its mean value, increasing the neighborhood order makes Eq. (1) converge to a Gaussian distribution. This behavior is observed in numerical simulations and illustrated in Figs. 1 and 2.


Fig. 2. Gaussian approximation for the distance statistics for $k \gg 1$ in two-dimensional media. Simulation parameters are $d = 2$, $\rho = 65365$ and $k = 10$. The slight mismatch in the tails is due to the fact that the tails of the distribution converge more slowly than the peak.

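The convergence to the Gaussian is due to the summation of volumes. The volume needed around a point $i$ to find $k$ neighbors is, on average, $kV_1$, where $V_1$ is the volume needed to find the nearest neighbor; the distance in this case is a random variable proportional to $(kV_1)^{\beta}$. More precisely, in terms of $\lambda$, the volume up to the kth neighbor is the sum of $k$ independent, exponentially distributed volume increments between successive neighbors, so the central limit theorem applies directly. Another way of understanding this convergence is to consider the summation of the thicknesses of the spherical shells comprising the same volume.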
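A sketch of the volume-summation argument, relying on the fact that, for a Poisson process, the values of $\lambda$ at successive neighbors are separated by independent unit-mean exponential increments. Summing $k$ such increments and mapping to $l/\theta = \lambda^{1/d}$ shows the relative standard deviation and the skewness shrinking as $k$ grows. The function names and sample sizes are illustrative choices.

```python
import math
import random

def kth_distance_samples(k, d, n_samples, seed=0):
    """Sample l / theta = lambda^(1/d), with lambda a sum of k unit-mean exponential volumes."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        lam = sum(rng.expovariate(1.0) for _ in range(k))   # volume up to the kth neighbor
        out.append(lam ** (1.0 / d))
    return out

def relative_std_and_skewness(xs):
    """Sample relative standard deviation and (biased) sample skewness."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return math.sqrt(m2) / mean, m3 / m2 ** 1.5

if __name__ == "__main__":
    for k in (1, 4, 16, 64):
        rel_std, skew = relative_std_and_skewness(kth_distance_samples(k, 2, 10000))
        print(f"k = {k:3d}: relative std = {rel_std:.3f}, skewness = {skew:.3f}")
```

For $d = 2$, both printed quantities should fall roughly as $1/\sqrt{k}$, starting from the Rayleigh values (relative standard deviation about 0.52, skewness about 0.63) at $k = 1$.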


4 Applications 

In this section, we discuss possible applications of our results in the context of pseudo-random number generation and of tests for departures from the Poissonian hypothesis in spatial distributions. Table 1 enumerates various pdfs obtained by varying the dimensionality of the medium and the neighborhood order in Eq. (1).

Owing to the large number of special cases of the distance statistics (Table 1), one possible application is to use Eq. (1) as a very general pseudo-random number generator. Although not efficient in terms of computation time, it provides a unified view of the probability density functions arising from distance measurements in random media.
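Generating random media and measuring distances is slow. As an alternative sketch (our choice of technique, not necessarily the one intended by the authors), one can sample Eq. (1) directly: since $\lambda = N_u l^d$ follows the gamma law $f_k(\lambda) = \lambda^{k-1}e^{-\lambda}/\Gamma(k)$, drawing $\lambda$ with Python's `random.gammavariate(k, 1.0)` and setting $l = (\lambda/N_u)^{1/d}$ reproduces the special cases of Table 1 by choosing $d$ and $k$. The function name `sample_eq1` is hypothetical.

```python
import math
import random

def sample_eq1(rho, d, k, rng=random):
    """Draw one kth-neighbor distance from Eq. (1).

    lambda = N_u l^d is Gamma(k, 1) distributed, so l = (lambda / N_u)^(1/d).
    """
    nu = rho * math.pi ** (d / 2) / math.gamma(1 + d / 2)
    lam = rng.gammavariate(k, 1.0)
    return (lam / nu) ** (1.0 / d)

if __name__ == "__main__":
    rng = random.Random(42)
    rayleigh = [sample_eq1(1.0, 2, 1, rng) for _ in range(5)]   # d = 2, k = 1
    nakagami = [sample_eq1(1.0, 2, 3, rng) for _ in range(5)]   # d = 2, k = 3
    weibull = [sample_eq1(1.0, 5, 1, rng) for _ in range(5)]    # arbitrary d, k = 1
    print(rayleigh, nakagami, weibull, sep="\n")
```

For $\rho = 1$, $d = 2$ and $k = 1$ (Rayleigh), the sample mean should approach $\langle l \rangle = 1/(2\sqrt{\rho}) = 0.5$, the classical nearest neighbor result of Clark and Evans.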

Table 1. Summary of probability distributions obtained from Eq. (1) for different dimensionalities and neighborhood orders. The symbol (-) means an arbitrary value, ∞ denotes a large value, and (*) means a distribution in the y variable given by Eq. (7).

d    k    Distribution
1    1    Exponential
1    -    Gamma
1    ∞    Gaussian
2    1    Rayleigh
2    -    Nakagami
3    -    Wilson-Hilferty
-    1    Weibull
-    -    Stacy
-    ∞    Gaussian
∞    1    *Gumbel
∞    -    *Log-gamma
∞    ∞    *Gaussian


Another possible application of Eq. (1) is to evaluate whether given distances among points deviate from the Poissonian hypothesis. This evaluation was originally employed by Thompson [9] on the distance distribution among trees in a two-dimensional environment. One way to assess deviations from this hypothesis is to perform a test of significance for the average distance to the kth nearest neighbor. The test uses the fact that Eq. (1) can be transformed into a $\chi^2$ (chi-square) distribution, Eq. (12). As a generalization of Thompson's result, we propose the same test in an environment of arbitrary dimensionality. Rewriting Eq. (1) as a function of $\chi^2 = 2\lambda = 2N_u l^d$, it becomes:

$$ f(\chi^2) = \frac{1}{2\,\Gamma(k)}\left(\frac{\chi^2}{2}\right)^{k-1}\exp\left(-\frac{\chi^2}{2}\right), \qquad (12) $$

which is the $\chi^2$ distribution with $2k$ degrees of freedom. Once the density of points in the medium is known, it is possible to apply the test and detect deviations at any neighborhood order, not only for the nearest neighbor.
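A minimal sketch of the generalized test in Python, under assumptions of our own: the $n$ measured kth-neighbor distances are treated as independent, so that under the Poisson hypothesis $2N_u\sum_i l_i^d$ follows a $\chi^2$ distribution with $2kn$ degrees of freedom, and the two-sided p-value uses the closed-form survival function for an even number of degrees of freedom, $P(\chi^2_{2m} > x) = \sum_{j=0}^{m-1} e^{-x/2}(x/2)^j/j!$. The function names are hypothetical.

```python
import math

def chi2_even_sf(x, m):
    """P(chi^2 with 2m degrees of freedom > x) = P(Poisson(x/2) <= m - 1)."""
    half = x / 2.0
    if half == 0.0:
        return 1.0
    # sum the Poisson(half) probabilities in log space to avoid overflow for large m
    log_terms = [-half + j * math.log(half) - math.lgamma(j + 1) for j in range(m)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))

def poisson_kth_neighbor_test(distances, rho, d, k):
    """Two-sided test of the Poisson hypothesis from n measured kth-neighbor distances."""
    nu = rho * math.pi ** (d / 2) / math.gamma(1 + d / 2)
    stat = 2.0 * nu * sum(l ** d for l in distances)   # ~ chi^2 with 2kn dof under H0
    m = k * len(distances)
    upper = chi2_even_sf(stat, m)
    p_value = min(1.0, 2.0 * min(upper, 1.0 - upper))
    return stat, p_value

if __name__ == "__main__":
    # distances equal to the value expected on average, then a clustered version
    l0 = (1.0 / math.pi) ** 0.5
    print(poisson_kth_neighbor_test([l0] * 100, 1.0, 2, 1))
    print(poisson_kth_neighbor_test([l0 / 2] * 100, 1.0, 2, 1))
```

A small p-value with a statistic below $2kn$ points to clustering (neighbors closer than expected), while a small p-value with a statistic above $2kn$ points to a more regular arrangement.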
5 Conclusion

Using only the Poisson process, we calculate the statistical distribution of distances in disordered media of arbitrary dimensionality. Our results have been validated by Monte Carlo simulations. Starting from Eq. (1), we calculate the limiting cases of high dimensionality and high neighborhood order. In the high-dimensional case, the distance statistics becomes a delta sequence around the mean distance, whose value was first conjectured by Cerf et al. In high neighborhood order, the distance statistics converges to a Gaussian distribution owing to the central limit theorem. The general pdf with $d \le 3$ and arbitrary neighborhood order leads to special cases that retrieve well-known pdfs such as the gamma, Weibull, etc. Distance statistics may detect departures from the Poissonian hypothesis, as pointed out by Thompson for $d = 2$ and generalized by Eq. (12), opening up new possibilities such as three-dimensional image analysis of cell distributions.
q (305738/2010-0 and 485155/2013-3) and CAPES. The au-